Weakly Aligned Feature Fusion for Multimodal Object Detection


Abstract

To achieve accurate and robust object detection in real-world scenarios, various forms of images are incorporated, such as color, thermal, and depth. However, multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned, so that one object has different positions in different modalities. For deep learning methods, this problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training. In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem. First, a region feature (RF) alignment module with an adjacent similarity constraint is designed to consistently predict the position shift between two modalities and adaptively align the cross-modal RFs. Second, we propose a novel region of interest (RoI) jitter strategy to improve the robustness to unexpected shift patterns. Third, we present a new multimodal feature fusion method that selects the more reliable feature and suppresses the less useful one via feature reweighting. In addition, by locating bounding boxes in both modalities and building their relationships, we provide a novel multimodal labeling named KAIST-Paired. Extensive experiments on 2-D and 3-D object detection, RGB-T, and RGB-D datasets demonstrate the effectiveness and robustness of our method.
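
The RoI jitter idea mentioned in the abstract can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the function name `jitter_roi`, the Gaussian offset model, and the default `sigma` are all assumptions made here. It shifts a box by a random offset proportional to its size, which is one plausible way to expose a detector to varied shift patterns during training.

```python
import random


def jitter_roi(box, sigma=0.05, rng=None):
    """Shift an (x1, y1, x2, y2) box by a random offset.

    Hypothetical helper: the offset is drawn from a zero-mean Gaussian
    and scaled by the box width and height (an assumed model, not taken
    from the paper). The box size is left unchanged.
    """
    if rng is None:
        rng = random.Random()
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    dx = rng.gauss(0.0, sigma) * width   # horizontal shift
    dy = rng.gauss(0.0, sigma) * height  # vertical shift
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


if __name__ == "__main__":
    # Jitter the RoI of one modality while the other stays fixed.
    reference_roi = (10.0, 20.0, 50.0, 100.0)
    sensed_roi = jitter_roi(reference_roi, sigma=0.05, rng=random.Random(0))
    print(sensed_roi)
```

In this sketch only the position changes, so the jittered box keeps the same width and height as the input; passing a seeded `random.Random` makes the shift reproducible.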


Similar resources

Self-Attentive Feature-level Fusion for Multimodal Emotion Detection

Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedia content. Such resources contain complementary information in multiple modalities. A stiff challenge often faced is the complexity associated with feature-level fusion of these heterogeneous modes. In this paper, we propose a new feature-level fusion method based on a self-attention mechanism. We ...


Multi Sensor Fusion for Object Detection Using Generalized Feature Models

This paper presents a multi-sensor tracking system and introduces the use of new generalized feature models. Detecting and recognizing objects as self-contained parts of the real world with two or more sensors, of the same or of several types, requires on the one hand fusion methods suitable for combining the data coming from the set of sensors in an optimal manner. This is realized by a sensor fusi...


Feature-Level based Video Fusion for Object Detection

Fusion of three-dimensional data from multiple sensors gained momentum, especially in applications pertaining to surveillance, when promising results were obtained in moving object detection. Several approaches to video fusion of visual and infrared data have been proposed in recent literature. They mainly consist of pixel-based methodologies. Surveillance is a major application of video fusio...


Object-centered Feature Selection for Weakly-Unsupervised Object Categorization

We describe a novel approach to spatio-temporal mapping of local image features that reduces the amount of input data for further object categorization. The main focus of our work is the selection of good features to learn, by achieving a precise mapping of image features either related to static objects or to background. This can be done by initial camera motion estimation, subsequent structure ...



Journal

Journal title: IEEE Transactions on Neural Networks and Learning Systems

Year: 2021

ISSN: 2162-237X, 2162-2388

DOI: https://doi.org/10.1109/tnnls.2021.3105143